The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research

Authors

  • Valentin Amrhein
  • Fränzi Korner-Nievergelt
  • Tobias Roth
Abstract

The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) is also hardly replicable: at a good statistical power of 80%, two studies will be 'conflicting', meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication therefore cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis.
Larger p-values also offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis; doing so leads to the false conclusion that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should instead be more stringent, that sample sizes could decrease, or that p-values should be abandoned completely. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
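
The 'one third' figure in the abstract can be checked with a short calculation. The sketch below is an illustration, not the authors' code. It assumes two independent studies of the same true effect, each with the same power, so that the probability that exactly one of them is significant is 2 × power × (1 − power). The function name conflict_probability is hypothetical.

```python
def conflict_probability(power: float) -> float:
    """Probability that exactly one of two independent studies is significant.

    Assumes a true effect exists and that both studies have the same
    statistical power (an assumption made for this illustration).
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must be between 0 and 1")
    # Study A significant and B not, or B significant and A not.
    return 2.0 * power * (1.0 - power)


if __name__ == "__main__":
    for power in (0.5, 0.8, 0.95):
        p = conflict_probability(power)
        print(f"power={power:.2f}  P(one significant, one not)={p:.3f}")
```

At 80% power this gives 2 × 0.8 × 0.2 = 0.32, which is roughly one third, as the abstract states. Under the same assumptions, the proportion of 'conflicting' pairs is largest (0.5) at 50% power.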

Journal title:

Volume 5, Issue

Pages -

Publication date: 2017